Goto

Collaborating Authors

 policy improvement





Multi-Step Generalized Policy Improvement by Leveraging Approximate Models Lucas N. Alegre 1, 2 Ana L. C. Bazzan 1 Ann Now é 2 Bruno C. da Silva 3 1

Neural Information Processing Systems

We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs).


Policy Improvement using Language Feedback Models

Neural Information Processing Systems

First, by using LFMs to identify desirable behaviour to imitate, we improve in task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, imitation learning using LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens.